Comparing CNN and LSTM character-level embeddings in BiLSTM-CRF models for chemical and disease named entity recognition

Zenan Zhai; Dat Quoc Nguyen; Karin Verspoor

Conference Proceedings

Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis | Association for Computational Linguistics | Published: 2018

DOI: 10.18653/v1/w18-5605

Abstract

We compare the use of LSTM-based and CNN-based character-level word embeddings in BiLSTM-CRF models to approach chemical and disease named entity recognition (NER) tasks. Empirical results over the BioCreative V CDR corpus show that the use of either type of character-level word embeddings in conjunction with the BiLSTM-CRF models leads to comparable state-of-the-art performance. However, the models using CNN-based character-level word embeddings have a computational performance advantage, increasing training time over word-based models by 25% while the LSTM-based character-level word embeddings more than double the required training time.
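As an illustration of the two kinds of character-level word embedding the abstract compares, the following is a minimal, untrained sketch in plain Python. It is not the authors' implementation: the dimensions, the random weights, the omission of bias terms, and the function names (cnn_word_embedding, lstm_word_embedding) are all assumptions made for the example. It uses a common formulation in which the CNN variant max-pools filter responses over character windows, and the LSTM variant concatenates the final states of a forward and a backward pass over the characters.

```python
import math
import random

random.seed(0)

CHAR_DIM = 4    # size of each character embedding (illustrative assumption)
OUT_DIM = 6     # size of the resulting character-level word embedding
WINDOW = 3      # CNN filter width, in characters
HID = OUT_DIM // 2  # per-direction LSTM size, so both encoders emit OUT_DIM values


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


char_table = {}  # one random embedding per character, created on first use


def embed(ch):
    if ch not in char_table:
        char_table[ch] = [random.uniform(-0.5, 0.5) for _ in range(CHAR_DIM)]
    return char_table[ch]


cnn_filters = rand_matrix(OUT_DIM, WINDOW * CHAR_DIM)


def cnn_word_embedding(word):
    """Apply the filters to every character window, then max-pool over positions."""
    pad = [[0.0] * CHAR_DIM] * (WINDOW // 2)
    chars = pad + [embed(c) for c in word] + pad
    windows = [sum(chars[i:i + WINDOW], []) for i in range(len(chars) - WINDOW + 1)]
    feats = [matvec(cnn_filters, w) for w in windows]  # windows are independent
    return [max(column) for column in zip(*feats)]


def make_lstm():
    # One weight matrix per gate (input, forget, output, candidate); biases omitted.
    return {gate: rand_matrix(HID, CHAR_DIM + HID) for gate in "ifoc"}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_final_state(params, vectors):
    h = [0.0] * HID
    c = [0.0] * HID
    for x in vectors:  # each step needs the previous step's h and c
        xh = x + h
        i = [sigmoid(v) for v in matvec(params["i"], xh)]
        f = [sigmoid(v) for v in matvec(params["f"], xh)]
        o = [sigmoid(v) for v in matvec(params["o"], xh)]
        g = [math.tanh(v) for v in matvec(params["c"], xh)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


fwd_lstm, bwd_lstm = make_lstm(), make_lstm()


def lstm_word_embedding(word):
    """Concatenate the final states of a forward and a backward character pass."""
    vectors = [embed(c) for c in word]
    return lstm_final_state(fwd_lstm, vectors) + lstm_final_state(bwd_lstm, vectors[::-1])


for word in ["aspirin", "nephrotoxicity"]:
    print(word, len(cnn_word_embedding(word)), len(lstm_word_embedding(word)))
```

In a BiLSTM-CRF tagger, a vector of this kind would typically be concatenated with the ordinary word embedding before the word-level BiLSTM. The sketch also hints at one plausible source of the training-time gap reported in the abstract: the CNN's windows can be evaluated independently of one another, whereas each LSTM step has to wait for the previous one.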

University of Melbourne Researchers

Karin Verspoor (Author)

Related Projects (1)

Natural language processing for automated validation of protein databases

The project aims to use natural language processing and information retrieval to reconcile and improve sources of biological information. Bi..

Grants

Awarded by Australian Research Council

Citation metrics

Scopus: 21

Dimensions: 31

Keywords

46 Information and Computing Sciences

4605 Data Management and Data Science

31 Biological Sciences